Abstract. In this article we propose a method for solving unconstrained optimization problems with convex and Lipschitz continuous objective functions. By making use of the Moreau envelopes of the functions occurring in the objective, we smooth the latter to a convex and differentiable function with Lipschitz continuous gradient, using both variable and constant smoothing parameters. The resulting problem is solved via an accelerated first-order method, and this allows us to approximately recover the optimal solutions to the initial optimization problem with a rate of convergence of order $O(\frac{\ln k}{k})$ for variable smoothing and of order $O(\frac{1}{k})$ for constant smoothing. Some numerical experiments employing the variable smoothing method in image processing and in supervised learning classification are also presented.

1 Introduction



In this paper we introduce and investigate the convergence properties of an efficient algorithm for solving nondifferentiable optimization problems of the type

$$\inf_{x\in\mathcal{H}}\{f(x)+g(Kx)\}, \qquad (1)$$

where $\mathcal{H}$ and $\mathcal{K}$ are real Hilbert spaces, $f:\mathcal{H}\to\mathbb{R}$ and $g:\mathcal{K}\to\mathbb{R}$ are convex and Lipschitz continuous functions and the operator $K:\mathcal{H}\to\mathcal{K}$ is linear and continuous. By replacing the functions $f$ and $g$ by their Moreau envelopes, an approach which can be seen as part of the family of smoothing techniques introduced in [13-15], we approximate (1) by a convex optimization problem with a differentiable objective function with Lipschitz continuous gradient. This smoothing approach can be seen as the counterpart of the so-called double smoothing method investigated in [5, 6, 11], which smooths the Fenchel dual problem to (1) into an optimization problem with a strongly convex and differentiable objective function with Lipschitz continuous gradient. There, the smoothed dual problem is solved via an appropriate fast gradient method (cf. [18]) and a primal optimal solution is reconstructed with a given level of accuracy. In contrast to that approach, which requires the effective domains of $f$ and $g$ to be bounded, what matters here is the boundedness of the effective domains of the conjugate functions $f^*$ and $g^*$, which is automatically guaranteed by the Lipschitz continuity of $f$ and $g$, respectively. For solving the resulting smoothed problem we propose an extension of the accelerated gradient method of Nesterov (cf. [17]) for convex optimization problems involving variable smoothing parameters which are updated in each iteration. For the minimization of the objective of the initial problem, this scheme yields a rate of convergence of order $O(\frac{\ln k}{k})$, while, in the particular case when the smoothing parameters are constant, the order of the rate of convergence becomes $O(\frac{1}{k})$. Nonetheless, using variable smoothing parameters has an important advantage, even though the theoretical rate of convergence is not as good as when these are constant. In the first case the approach generates a sequence of iterates $(x_k)_{k\ge1}$ such that $(f(x_k)+g(Kx_k))_{k\ge1}$ converges to the optimal objective value of (1). In the case of constant smoothing parameters the approach provides a sequence of iterates which solves problem (1) with an a priori given accuracy; however, the sequence $(f(x_k)+g(Kx_k))_{k\ge1}$ may not converge to the optimal objective value of the problem to be solved.

*Faculty of Mathematics, Chemnitz University of Technology, D-09107 Chemnitz, Germany, e-mail: radu.bot@mathematik.tu-chemnitz.de. Research partially supported by DFG (German Research Foundation), project BO 2516/4-1.

†Faculty of Mathematics, Chemnitz University of Technology, D-09107 Chemnitz, Germany, e-mail: christopher.hendrich@mathematik.tu-chemnitz.de.

In addition, we show, on the one hand, that the two approaches can be designed, and keep the same convergence behavior, also in the case when $f$ is differentiable with Lipschitz continuous gradient and, on the other hand, that they can also be employed for solving the extended version of (1)

$$\inf_{x\in\mathcal{H}}\Big\{f(x)+\sum_{i=1}^m g_i(K_ix)\Big\}, \qquad (2)$$

where $\mathcal{K}_i$ are real Hilbert spaces, $g_i:\mathcal{K}_i\to\mathbb{R}$ are convex and Lipschitz continuous functions and $K_i:\mathcal{H}\to\mathcal{K}_i$, $i=1,\dots,m$, are linear continuous operators.

The structure of this paper is as follows. In Section 2 we recall some elements of convex analysis and establish the working framework. Section 3 is mainly devoted to the description of the iterative methods for solving (1) and of their convergence properties for both variable and constant smoothing, and to the presentation of some of their variants. In Section 4 numerical experiments employing the variable smoothing method in image processing and in support vector machines classification are presented.



2 Preliminaries of convex analysis and problem formulation

In the following we consider the real Hilbert spaces $\mathcal{H}$ and $\mathcal{K}$ endowed with the inner product $\langle\cdot,\cdot\rangle$ and associated norm $\|\cdot\|=\sqrt{\langle\cdot,\cdot\rangle}$. By $B_{\mathcal{H}}\subseteq\mathcal{H}$ and $\mathbb{R}_{++}$ we denote the closed unit ball of $\mathcal{H}$ and the set of strictly positive real numbers, respectively. The indicator function of the set $C\subseteq\mathcal{H}$ is the function $\delta_C:\mathcal{H}\to\overline{\mathbb{R}}:=\mathbb{R}\cup\{\pm\infty\}$ defined by $\delta_C(x)=0$ for $x\in C$ and $\delta_C(x)=+\infty$, otherwise. For a function $f:\mathcal{H}\to\overline{\mathbb{R}}$ we denote by $\operatorname{dom}f:=\{x\in\mathcal{H}:f(x)<+\infty\}$ its effective domain. We call $f$ proper if $\operatorname{dom}f\neq\emptyset$ and $f(x)>-\infty$ for all $x\in\mathcal{H}$. The conjugate function of $f$ is $f^*:\mathcal{H}\to\overline{\mathbb{R}}$, $f^*(p)=\sup\{\langle p,x\rangle-f(x):x\in\mathcal{H}\}$ for all $p\in\mathcal{H}$. The biconjugate function of $f$ is $f^{**}:\mathcal{H}\to\overline{\mathbb{R}}$, $f^{**}(x)=\sup\{\langle x,p\rangle-f^*(p):p\in\mathcal{H}\}$ and, when $f$ is proper, convex and lower semicontinuous, according to the Fenchel-Moreau Theorem, one has $f=f^{**}$. The (convex) subdifferential of the function $f$ at $x\in\mathcal{H}$ is the set $\partial f(x)=\{p\in\mathcal{H}:f(y)-f(x)\ge\langle p,y-x\rangle\ \forall y\in\mathcal{H}\}$, if $f(x)\in\mathbb{R}$, and is taken to be the empty set, otherwise. For a linear continuous operator $K:\mathcal{H}\to\mathcal{K}$, the operator $K^*:\mathcal{K}\to\mathcal{H}$ is the adjoint operator of $K$ and is defined by $\langle K^*y,x\rangle=\langle y,Kx\rangle$ for all $x\in\mathcal{H}$ and all $y\in\mathcal{K}$.

Given two functions $f,g:\mathcal{H}\to\overline{\mathbb{R}}$, their infimal convolution is defined by $f\,\square\,g:\mathcal{H}\to\overline{\mathbb{R}}$, $(f\,\square\,g)(x)=\inf_{y\in\mathcal{H}}\{f(y)+g(x-y)\}$ for all $x\in\mathcal{H}$. When $f,g:\mathcal{H}\to\overline{\mathbb{R}}$ are proper and convex, then

$$(f+g)^*=f^*\,\square\,g^* \qquad (3)$$

provided that $f$ (or $g$) is continuous at a point belonging to $\operatorname{dom}f\cap\operatorname{dom}g$. For other qualification conditions guaranteeing (3) we refer the reader to [3].

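The Moreau envelope of parameter $\gamma\in\mathbb{R}_{++}$ of a proper, convex and lower semicontinuous function $f:\mathcal{H}\to\overline{\mathbb{R}}$ is the function ${}^{\gamma}f:\mathcal{H}\to\mathbb{R}$, defined as

$${}^{\gamma}f(x):=\Big(f\,\square\,\frac{1}{2\gamma}\|\cdot\|^2\Big)(x)=\inf_{y\in\mathcal{H}}\Big\{f(y)+\frac{1}{2\gamma}\|x-y\|^2\Big\}\quad\forall x\in\mathcal{H}.$$

For every $x\in\mathcal{H}$ we denote by $\operatorname{Prox}_{\gamma f}(x)$ the proximal point of parameter $\gamma$ of $f$ at $x$, namely, the unique optimal solution of the optimization problem

$$\inf_{y\in\mathcal{H}}\Big\{f(y)+\frac{1}{2\gamma}\|y-x\|^2\Big\}. \qquad (4)$$

Notice that $\operatorname{Prox}_{\gamma f}:\mathcal{H}\to\mathcal{H}$ is single-valued and firmly nonexpansive (cf. [1, Proposition 12.27]), i.e.,

$$\|\operatorname{Prox}_{\gamma f}(x)-\operatorname{Prox}_{\gamma f}(y)\|^2+\|(x-\operatorname{Prox}_{\gamma f}(x))-(y-\operatorname{Prox}_{\gamma f}(y))\|^2\le\|x-y\|^2\quad\forall x,y\in\mathcal{H}, \qquad (5)$$

thus 1-Lipschitz continuous, i.e., Lipschitz continuous with Lipschitz constant equal to 1. We also have (cf. [1, Theorem 14.3])

$${}^{\gamma}f(x)+{}^{1/\gamma}(f^*)\Big(\frac{x}{\gamma}\Big)=\frac{\|x\|^2}{2\gamma}\quad\forall x\in\mathcal{H} \qquad (6)$$

and the extended Moreau decomposition formula

$$\operatorname{Prox}_{\gamma f}(x)+\gamma\operatorname{Prox}_{\frac{1}{\gamma}f^*}\Big(\frac{x}{\gamma}\Big)=x\quad\forall x\in\mathcal{H}. \qquad (7)$$

The function ${}^{\gamma}f$ is (Fréchet) differentiable on $\mathcal{H}$ and its gradient $\nabla({}^{\gamma}f):\mathcal{H}\to\mathcal{H}$ fulfills (cf. [1, Proposition 12.29])

$$\nabla({}^{\gamma}f)(x)=\frac{1}{\gamma}\big(x-\operatorname{Prox}_{\gamma f}(x)\big)\quad\forall x\in\mathcal{H}, \qquad (8)$$

being, in the light of (5), $\frac{1}{\gamma}$-Lipschitz continuous. For a nonempty, convex and closed set $C\subseteq\mathcal{H}$ and $\gamma\in\mathbb{R}_{++}$ we have that $\operatorname{Prox}_{\gamma\delta_C}=\mathcal{P}_C$, where $\mathcal{P}_C:\mathcal{H}\to C$, $\mathcal{P}_C(x)=\operatorname{argmin}_{z\in C}\|x-z\|$, denotes the projection operator onto $C$.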
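To make (6)-(8) concrete, here is a small numerical sketch for an illustrative choice of ours (not taken from the text): $\mathcal{H}=\mathbb{R}$ and $f=|\cdot|$. Then $\operatorname{Prox}_{\gamma f}$ is soft-thresholding, ${}^{\gamma}f$ is the Huber function and $f^*=\delta_{[-1,1]}$, so that $\operatorname{Prox}_{(1/\gamma)f^*}$ is the projection onto $[-1,1]$. The helper names are hypothetical; the assertions check (6), (7) and (8) at a few points.

```python
import math

def prox_abs(x, gamma):
    """Prox_{gamma f}(x) for f = |.|: soft-thresholding with threshold gamma."""
    return math.copysign(max(abs(x) - gamma, 0.0), x)

def moreau_abs(x, gamma):
    """Moreau envelope of parameter gamma of f = |.| (the Huber function)."""
    p = prox_abs(x, gamma)
    return abs(p) + (x - p) ** 2 / (2.0 * gamma)

def project_box(z, r=1.0):
    """Prox of any positive multiple of the indicator of [-r, r]: the projection onto [-r, r]."""
    return min(max(z, -r), r)

def moreau_conj(z, gamma):
    """Moreau envelope of parameter 1/gamma of f* = indicator of [-1, 1]: (gamma/2) dist(z, [-1, 1])^2."""
    return 0.5 * gamma * (z - project_box(z)) ** 2

def grad_moreau_abs(x, gamma):
    """Gradient of the envelope via (8): (x - Prox_{gamma f}(x)) / gamma."""
    return (x - prox_abs(x, gamma)) / gamma

gamma, h = 0.5, 1e-6
for x in [-2.0, -0.3, 0.0, 0.2, 1.7]:
    # (6): envelope of f at x plus envelope of f* at x/gamma equals x^2 / (2 gamma)
    assert abs(moreau_abs(x, gamma) + moreau_conj(x / gamma, gamma) - x * x / (2 * gamma)) < 1e-12
    # (7): Prox_{gamma f}(x) + gamma * Prox_{(1/gamma) f*}(x / gamma) = x
    assert abs(prox_abs(x, gamma) + gamma * project_box(x / gamma) - x) < 1e-12
    # (8): compare with a central finite difference of the envelope
    fd = (moreau_abs(x + h, gamma) - moreau_abs(x - h, gamma)) / (2 * h)
    assert abs(fd - grad_moreau_abs(x, gamma)) < 1e-5
```

The gradient of the envelope equals the projection $\mathcal{P}_{[-1,1]}(x/\gamma)$ here, which is exactly the form $\operatorname{Prox}_{(1/\gamma)f^*}(x/\gamma)$ used for the smoothed gradients in Section 3.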
When $f:\mathcal{H}\to\mathbb{R}$ is convex and differentiable with an $L_{\nabla f}$-Lipschitz continuous gradient, then for all $x,y\in\mathcal{H}$ it holds (see, for instance, [1, 16, 17])

$$f(y)\le f(x)+\langle\nabla f(x),y-x\rangle+\frac{L_{\nabla f}}{2}\|y-x\|^2. \qquad (9)$$



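The optimization problem that we investigate in this paper is

$$(P)\qquad\inf_{x\in\mathcal{H}}\{f(x)+g(Kx)\},$$

where $K:\mathcal{H}\to\mathcal{K}$ is a linear continuous operator and $f:\mathcal{H}\to\mathbb{R}$ and $g:\mathcal{K}\to\mathbb{R}$ are convex and $L_f$-Lipschitz continuous and $L_g$-Lipschitz continuous functions, respectively. According to [2, Proposition 4.4.6] we have that

$$\operatorname{dom}f^*\subseteq L_fB_{\mathcal{H}}\quad\text{and}\quad\operatorname{dom}g^*\subseteq L_gB_{\mathcal{K}}. \qquad (10)$$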
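As a one-dimensional illustration of (10) (an example of ours, not from the original text), the $L_f$-Lipschitz continuous function $f=L_f|\cdot|$ on $\mathcal{H}=\mathbb{R}$ has an indicator function as its conjugate, so that the inclusion in (10) holds with equality:

$$f^*(p)=\sup_{x\in\mathbb{R}}\{px-L_f|x|\}=\begin{cases}0,&|p|\le L_f,\\+\infty,&\text{otherwise},\end{cases}\qquad\text{i.e.}\quad f^*=\delta_{[-L_f,L_f]},\quad\operatorname{dom}f^*=L_fB_{\mathbb{R}}.$$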



3 The algorithm and its variants 
3.1 The smoothing of the problem (P) 

The algorithms we introduce and analyze with respect to their convergence properties first require an appropriate smoothing of problem (P), which we describe in the following.

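For $\rho\in\mathbb{R}_{++}$ we smooth $f$ via its Moreau envelope of parameter $\rho$, ${}^{\rho}f:\mathcal{H}\to\mathbb{R}$, ${}^{\rho}f(x)=\big(f\,\square\,\frac{1}{2\rho}\|\cdot\|^2\big)(x)$ for every $x\in\mathcal{H}$. According to the Fenchel-Moreau Theorem and due to (3), one has for $x\in\mathcal{H}$

$${}^{\rho}f(x)=\Big(f^{**}\,\square\,\frac{1}{2\rho}\|\cdot\|^2\Big)(x)=\Big(f^*+\frac{\rho}{2}\|\cdot\|^2\Big)^*(x)=\sup_{p\in\operatorname{dom}f^*}\Big\{\langle x,p\rangle-f^*(p)-\frac{\rho}{2}\|p\|^2\Big\}.$$

As already seen, ${}^{\rho}f$ is differentiable and its gradient (cf. (8) and (7))

$$\nabla({}^{\rho}f):\mathcal{H}\to\mathcal{H},\quad\nabla({}^{\rho}f)(x)=\frac{1}{\rho}\big(x-\operatorname{Prox}_{\rho f}(x)\big)=\operatorname{Prox}_{\frac{1}{\rho}f^*}\Big(\frac{x}{\rho}\Big)\quad\forall x\in\mathcal{H},$$

is $\frac{1}{\rho}$-Lipschitz continuous.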

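For $\mu\in\mathbb{R}_{++}$ we smooth $g\circ K$ via ${}^{\mu}g\circ K:\mathcal{H}\to\mathbb{R}$, ${}^{\mu}g\circ K(x)=\big(g\,\square\,\frac{1}{2\mu}\|\cdot\|^2\big)(Kx)$ for every $x\in\mathcal{H}$. According to the Fenchel-Moreau Theorem and due to (3), one has

$${}^{\mu}g\circ K(x)=\Big(g^{**}\,\square\,\frac{1}{2\mu}\|\cdot\|^2\Big)(Kx)=\Big(g^*+\frac{\mu}{2}\|\cdot\|^2\Big)^*(Kx)=\sup_{p\in\operatorname{dom}g^*}\Big\{\langle x,K^*p\rangle-g^*(p)-\frac{\mu}{2}\|p\|^2\Big\}\quad\forall x\in\mathcal{H}.$$

The function ${}^{\mu}g\circ K$ is differentiable and its gradient $\nabla({}^{\mu}g\circ K):\mathcal{H}\to\mathcal{H}$ fulfills (cf. (8) and (7))

$$\nabla({}^{\mu}g\circ K)(x)=K^*\nabla({}^{\mu}g)(Kx)=\frac{1}{\mu}K^*\big(Kx-\operatorname{Prox}_{\mu g}(Kx)\big)=K^*\operatorname{Prox}_{\frac{1}{\mu}g^*}\Big(\frac{Kx}{\mu}\Big)\quad\forall x\in\mathcal{H}.$$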
Further, for every $x,y\in\mathcal{H}$ it holds (see (5))

$$\|\nabla({}^{\mu}g\circ K)(x)-\nabla({}^{\mu}g\circ K)(y)\|\le\frac{1}{\mu}\|K\|\,\big\|(Kx-\operatorname{Prox}_{\mu g}(Kx))-(Ky-\operatorname{Prox}_{\mu g}(Ky))\big\|\le\frac{\|K\|^2}{\mu}\|x-y\|,$$

which shows that $\nabla({}^{\mu}g\circ K)$ is $\frac{\|K\|^2}{\mu}$-Lipschitz continuous.

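Finally, we consider as smoothing function for $f+g\circ K$ the function $F^{\rho,\mu}:\mathcal{H}\to\mathbb{R}$, $F^{\rho,\mu}(x)={}^{\rho}f(x)+{}^{\mu}g\circ K(x)$, which is differentiable with Lipschitz continuous gradient $\nabla F^{\rho,\mu}:\mathcal{H}\to\mathcal{H}$ given by

$$\nabla F^{\rho,\mu}(x)=\operatorname{Prox}_{\frac{1}{\rho}f^*}\Big(\frac{x}{\rho}\Big)+K^*\operatorname{Prox}_{\frac{1}{\mu}g^*}\Big(\frac{Kx}{\mu}\Big)\quad\forall x\in\mathcal{H},$$

having as Lipschitz constant $L(\rho,\mu):=\frac{1}{\rho}+\frac{\|K\|^2}{\mu}$.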

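For $\rho_2>\rho_1>0$ and every $x\in\mathcal{H}$ it holds (cf. (10))

$$\begin{aligned}{}^{\rho_1}f(x)&=\sup_{p\in\operatorname{dom}f^*}\Big\{\langle x,p\rangle-f^*(p)-\frac{\rho_1}{2}\|p\|^2\Big\}\\&\le\sup_{p\in\operatorname{dom}f^*}\Big\{\langle x,p\rangle-f^*(p)-\frac{\rho_2}{2}\|p\|^2\Big\}+\sup_{p\in\operatorname{dom}f^*}\Big\{\frac{\rho_2-\rho_1}{2}\|p\|^2\Big\}\le{}^{\rho_2}f(x)+(\rho_2-\rho_1)\frac{L_f^2}{2},\end{aligned}$$

which yields, letting $\rho_1\downarrow0$ (cf. [1, Proposition 12.32]),

$${}^{\rho_2}f(x)\le f(x)\le{}^{\rho_2}f(x)+\rho_2\frac{L_f^2}{2}.$$

Similarly, for $\mu_2>\mu_1>0$ and every $y\in\mathcal{K}$ it holds

$${}^{\mu_1}g(y)\le{}^{\mu_2}g(y)+(\mu_2-\mu_1)\frac{L_g^2}{2}$$

and

$${}^{\mu_2}g(y)\le g(y)\le{}^{\mu_2}g(y)+\mu_2\frac{L_g^2}{2}.$$

Consequently, for $\rho_2>\rho_1>0$, $\mu_2>\mu_1>0$ and every $x\in\mathcal{H}$ we have

$$F^{\rho_2,\mu_2}(x)\le F^{\rho_1,\mu_1}(x)\le F^{\rho_2,\mu_2}(x)+(\rho_2-\rho_1)\frac{L_f^2}{2}+(\mu_2-\mu_1)\frac{L_g^2}{2} \qquad (11)$$

and

$$F^{\rho_2,\mu_2}(x)\le F(x)\le F^{\rho_2,\mu_2}(x)+\rho_2\frac{L_f^2}{2}+\mu_2\frac{L_g^2}{2}. \qquad (12)$$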
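The bounds behind (11) and (12) can be checked numerically in the simplest case. The sketch below assumes, for illustration only, $\mathcal{H}=\mathbb{R}$ and $f=|\cdot|$ (so $L_f=1$ and ${}^{\rho}f$ is the Huber function); the function names are hypothetical.

```python
def huber(x, rho):
    """Moreau envelope of parameter rho of f = |.| (here L_f = 1)."""
    return x * x / (2.0 * rho) if abs(x) <= rho else abs(x) - rho / 2.0

def check_smoothing_bounds(xs, rho1, rho2, L_f=1.0):
    """Check the f-parts of (11) and (12) for f = |.| and rho2 > rho1 > 0."""
    assert rho2 > rho1 > 0
    tol = 1e-12
    for x in xs:
        e1, e2 = huber(x, rho1), huber(x, rho2)
        # rho2-envelope <= rho1-envelope <= rho2-envelope + (rho2 - rho1) L_f^2 / 2
        assert e2 <= e1 + tol and e1 <= e2 + (rho2 - rho1) * L_f ** 2 / 2 + tol
        # rho2-envelope <= f <= rho2-envelope + rho2 L_f^2 / 2
        assert e2 <= abs(x) + tol and abs(x) <= e2 + rho2 * L_f ** 2 / 2 + tol
    return True
```

For this $f$ the gap $f-{}^{\rho}f$ reaches its maximum $\rho/2$ exactly when $|x|\ge\rho$, so the constant $\rho L_f^2/2$ in (12) cannot be improved here.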
3.2 The variable smoothing and the constant smoothing algorithms



Throughout this paper $F:\mathcal{H}\to\mathbb{R}$, $F(x)=f(x)+g(Kx)$, will denote the objective function of (P). The variable smoothing algorithm which we present at the beginning of this subsection can be seen as an extension of the accelerated gradient method of Nesterov (cf. [17]) by using variable smoothing parameters, which we update in each iteration.



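Initialization: $t_1=1$, $y_1=x_0\in\mathcal{H}$, $(\rho_k)_{k\ge1},(\mu_k)_{k\ge1}\subseteq\mathbb{R}_{++}$
For $k\ge1$: $L_k=\frac{1}{\rho_k}+\frac{\|K\|^2}{\mu_k}$,
    $x_k=y_k-\frac{1}{L_k}\Big(\operatorname{Prox}_{\frac{1}{\rho_k}f^*}\Big(\frac{y_k}{\rho_k}\Big)+K^*\operatorname{Prox}_{\frac{1}{\mu_k}g^*}\Big(\frac{Ky_k}{\mu_k}\Big)\Big)$,
    $t_{k+1}=\frac{1+\sqrt{1+4t_k^2}}{2}$,
    $y_{k+1}=x_k+\frac{t_k-1}{t_{k+1}}(x_k-x_{k-1})$        (A1)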
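A minimal sketch of (A1) follows, on a toy instance of our own choosing (not from the text): $f(x)=\|x-c\|_1$ and $g=\|\cdot\|_1$ on Euclidean spaces, so that $f^*=\delta_{[-1,1]^n}+\langle\cdot,c\rangle$ and $g^*=\delta_{[-1,1]^m}$, and both proximal steps in (A1) reduce to componentwise clipping. Assumptions to note: the matrix $K$ is a list of rows, $\|K\|^2$ is replaced by the squared Frobenius norm (an upper bound, exact for a $1\times1$ matrix), and the smoothing parameters are those of Theorem 1 below. All names are hypothetical.

```python
import math

def matvec(K, x):
    """K x for a matrix stored as a list of rows."""
    return [sum(kij * xj for kij, xj in zip(row, x)) for row in K]

def rmatvec(K, y):
    """K^T y, the adjoint K^* in Euclidean spaces."""
    return [sum(K[i][j] * y[i] for i in range(len(K))) for j in range(len(K[0]))]

def clip(v, r=1.0):
    """Componentwise projection onto the box [-r, r]^n."""
    return [min(max(vi, -r), r) for vi in v]

def variable_smoothing_a1(c, K, x0, a=1.0, b=1.0, iters=1000):
    """Sketch of (A1) for f(x) = ||x - c||_1 and g(y) = ||y||_1.

    Prox_{(1/rho) f*}(y / rho) = clip((y - c) / rho) and
    Prox_{(1/mu) g*}(z / mu) = clip(z / mu).
    """
    norm_K_sq = sum(kij ** 2 for row in K for kij in row)  # Frobenius norm^2 >= ||K||^2
    x_prev, y, t = list(x0), list(x0), 1.0                 # x_0, y_1 = x_0, t_1 = 1
    for k in range(1, iters + 1):
        rho, mu = 1.0 / (a * k), 1.0 / (b * k)             # rho_k = 1/(ak), mu_k = 1/(bk)
        L = 1.0 / rho + norm_K_sq / mu                     # L_k
        grad_f = clip([(yi - ci) / rho for yi, ci in zip(y, c)])
        grad_g = rmatvec(K, clip([z / mu for z in matvec(K, y)]))
        x = [yi - (gf + gg) / L for yi, gf, gg in zip(y, grad_f, grad_g)]
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [xi + (t - 1.0) / t_next * (xi - xp) for xi, xp in zip(x, x_prev)]
        x_prev, t = x, t_next
    return x_prev                                          # x_iters
```

For instance, with `c = [3.0]`, `K = [[2.0]]` and `x0 = [3.0]` the objective is $|x-3|+2|x|$, whose minimizer is $x^*=0$ with optimal value 3; with $a=b=1$, estimate (13) guarantees that the objective value after 2000 iterations exceeds 3 by less than about 0.07.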



The convergence of algorithm (A1) is established by the following theorem.

Theorem 1. Let $f:\mathcal{H}\to\mathbb{R}$ be a convex and $L_f$-Lipschitz continuous function, $g:\mathcal{K}\to\mathbb{R}$ a convex and $L_g$-Lipschitz continuous function, $K:\mathcal{H}\to\mathcal{K}$ a linear continuous operator and $x^*\in\mathcal{H}$ an optimal solution to (P). Then, when choosing

$$\rho_k=\frac{1}{ak}\quad\text{and}\quad\mu_k=\frac{1}{bk}\quad\forall k\ge1,$$

where $a,b\in\mathbb{R}_{++}$, algorithm (A1) generates a sequence $(x_k)_{k\ge1}\subseteq\mathcal{H}$ satisfying

$$F(x_{k+1})-F(x^*)\le\frac{2(a+b\|K\|^2)}{k+2}\|x_0-x^*\|^2+\frac{2(1+\ln(k+1))}{k+2}\Big(\frac{L_f^2}{a}+\frac{L_g^2}{b}\Big)\quad\forall k\ge1, \qquad (13)$$

thus yielding a rate of convergence for the objective of order $O\big(\frac{\ln k}{k}\big)$.



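Proof. For any $k\ge1$ we denote $F^k:=F^{\rho_k,\mu_k}$, $p_k:=(t_k-1)(x_{k-1}-x_k)$ and

$$\xi_k:=\nabla F^k(y_k)=\operatorname{Prox}_{\frac{1}{\rho_k}f^*}\Big(\frac{y_k}{\rho_k}\Big)+K^*\operatorname{Prox}_{\frac{1}{\mu_k}g^*}\Big(\frac{Ky_k}{\mu_k}\Big).$$

Notice that, by the definition of $y_{k+1}$, it holds $t_{k+1}y_{k+1}=t_{k+1}x_k-p_k$ for any $k\ge1$. For any $k\ge1$ it holds

$$\begin{aligned}p_{k+1}-x_{k+1}&=(t_{k+1}-1)(x_k-x_{k+1})-x_{k+1}\\&=(t_{k+1}-1)x_k-t_{k+1}\Big(y_{k+1}-\frac{1}{L_{k+1}}\nabla F^{k+1}(y_{k+1})\Big)\\&=p_k-x_k+\frac{t_{k+1}}{L_{k+1}}\nabla F^{k+1}(y_{k+1})\end{aligned}$$

and from here, since $t_{k+1}(p_k-x_k+x^*)=(t_{k+1}-1)p_k+t_{k+1}(x^*-y_{k+1})$, it follows

$$\begin{aligned}\|p_{k+1}-x_{k+1}+x^*\|^2&=\|p_k-x_k+x^*\|^2+2\Big\langle p_k-x_k+x^*,\frac{t_{k+1}}{L_{k+1}}\xi_{k+1}\Big\rangle+\Big(\frac{t_{k+1}}{L_{k+1}}\Big)^2\|\xi_{k+1}\|^2\\&=\|p_k-x_k+x^*\|^2+\frac{2(t_{k+1}-1)}{L_{k+1}}\langle p_k,\xi_{k+1}\rangle+\frac{2t_{k+1}}{L_{k+1}}\langle x^*-y_{k+1},\xi_{k+1}\rangle+\Big(\frac{t_{k+1}}{L_{k+1}}\Big)^2\|\xi_{k+1}\|^2.\end{aligned}$$

Further, using (9), since $x_{k+1}=y_{k+1}-\frac{1}{L_{k+1}}\xi_{k+1}$, it follows

$$\begin{aligned}F^{k+1}(x_{k+1})&\le F^{k+1}(y_{k+1})+\langle\xi_{k+1},x_{k+1}-y_{k+1}\rangle+\frac{L_{k+1}}{2}\|x_{k+1}-y_{k+1}\|^2\\&=F^{k+1}(y_{k+1})-\frac{1}{L_{k+1}}\|\xi_{k+1}\|^2+\frac{1}{2L_{k+1}}\|\xi_{k+1}\|^2\\&=F^{k+1}(y_{k+1})-\frac{1}{2L_{k+1}}\|\xi_{k+1}\|^2 \qquad (14)\end{aligned}$$

and, from here, by making use of the convexity of $F^{k+1}$, we have

$$\langle x^*-y_{k+1},\xi_{k+1}\rangle\le F^{k+1}(x^*)-F^{k+1}(y_{k+1})\le F^{k+1}(x^*)-F^{k+1}(x_{k+1})-\frac{1}{2L_{k+1}}\|\xi_{k+1}\|^2\quad\forall k\ge1. \qquad (15)$$

On the other hand, since $F^{k+1}(x_k)-F^{k+1}(y_{k+1})\ge\langle\xi_{k+1},x_k-y_{k+1}\rangle$ and $x_k-y_{k+1}=\frac{1}{t_{k+1}}p_k$, we obtain

$$\|\xi_{k+1}\|^2\le2L_{k+1}\big(F^{k+1}(y_{k+1})-F^{k+1}(x_{k+1})\big)\le2L_{k+1}\Big(F^{k+1}(x_k)-F^{k+1}(x_{k+1})-\frac{1}{t_{k+1}}\langle p_k,\xi_{k+1}\rangle\Big)\quad\forall k\ge1. \qquad (16)$$

Thus, as $t_{k+1}^2-t_{k+1}=t_k^2$, and by making use of (11) (notice that $\rho_{k+1}<\rho_k$ and $\mu_{k+1}<\mu_k$), for any $k\ge1$ it yields

$$\begin{aligned}&\|p_{k+1}-x_{k+1}+x^*\|^2-\|p_k-x_k+x^*\|^2\\&\quad\le\frac{2(t_{k+1}-1)}{L_{k+1}}\langle p_k,\xi_{k+1}\rangle+\frac{2t_{k+1}}{L_{k+1}}\big(F^{k+1}(x^*)-F^{k+1}(x_{k+1})\big)+\frac{t_k^2}{L_{k+1}^2}\|\xi_{k+1}\|^2\\&\quad\le\frac{2t_k^2}{L_{k+1}}\big(F^{k+1}(x_k)-F^{k+1}(x^*)\big)-\frac{2t_{k+1}^2}{L_{k+1}}\big(F^{k+1}(x_{k+1})-F^{k+1}(x^*)\big)\\&\quad\le\frac{2t_k^2}{L_{k+1}}\Big(F^k(x_k)-F^k(x^*)+(\rho_k-\rho_{k+1})\frac{L_f^2}{2}+(\mu_k-\mu_{k+1})\frac{L_g^2}{2}\Big)-\frac{2t_{k+1}^2}{L_{k+1}}\big(F^{k+1}(x_{k+1})-F^{k+1}(x^*)\big)\\&\quad=\frac{2t_k^2}{L_{k+1}}\Big(F^k(x_k)-F^k(x^*)+\rho_k\frac{L_f^2}{2}+\mu_k\frac{L_g^2}{2}\Big)-\frac{2t_{k+1}^2}{L_{k+1}}\big(F^{k+1}(x_{k+1})-F^{k+1}(x^*)\big)-\frac{2t_k^2}{L_{k+1}}\Big(\rho_{k+1}\frac{L_f^2}{2}+\mu_{k+1}\frac{L_g^2}{2}\Big).\end{aligned}$$

By using (12) it follows that for any $k\ge1$

$$F^k(x_k)-F^k(x^*)+\rho_k\frac{L_f^2}{2}+\mu_k\frac{L_g^2}{2}\ge F(x_k)-F^k(x^*)\ge F(x_k)-F(x^*)\ge0,$$

thus, since $L_{k+1}\ge L_k$ and $t_{k+1}^2-t_k^2=t_{k+1}$,

$$\begin{aligned}\|p_{k+1}-x_{k+1}+x^*\|^2-\|p_k-x_k+x^*\|^2&\le\frac{2t_k^2}{L_k}\Big(F^k(x_k)-F^k(x^*)+\rho_k\frac{L_f^2}{2}+\mu_k\frac{L_g^2}{2}\Big)\\&\quad-\frac{2t_{k+1}^2}{L_{k+1}}\Big(F^{k+1}(x_{k+1})-F^{k+1}(x^*)+\rho_{k+1}\frac{L_f^2}{2}+\mu_{k+1}\frac{L_g^2}{2}\Big)+\frac{2t_{k+1}}{L_{k+1}}\Big(\rho_{k+1}\frac{L_f^2}{2}+\mu_{k+1}\frac{L_g^2}{2}\Big),\end{aligned}$$

which implies that

$$\begin{aligned}&\|p_{k+1}-x_{k+1}+x^*\|^2+\frac{2t_{k+1}^2}{L_{k+1}}\Big(F^{k+1}(x_{k+1})-F^{k+1}(x^*)+\rho_{k+1}\frac{L_f^2}{2}+\mu_{k+1}\frac{L_g^2}{2}\Big)\\&\quad\le\|p_k-x_k+x^*\|^2+\frac{2t_k^2}{L_k}\Big(F^k(x_k)-F^k(x^*)+\rho_k\frac{L_f^2}{2}+\mu_k\frac{L_g^2}{2}\Big)+\frac{2t_{k+1}}{L_{k+1}}\Big(\rho_{k+1}\frac{L_f^2}{2}+\mu_{k+1}\frac{L_g^2}{2}\Big).\end{aligned}$$

Making again use of (12), this further yields for any $k\ge1$

$$\begin{aligned}\frac{2t_{k+1}^2}{L_{k+1}}\big(F(x_{k+1})-F(x^*)\big)&\le\frac{2t_{k+1}^2}{L_{k+1}}\Big(F^{k+1}(x_{k+1})-F^{k+1}(x^*)+\rho_{k+1}\frac{L_f^2}{2}+\mu_{k+1}\frac{L_g^2}{2}\Big)+\|p_{k+1}-x_{k+1}+x^*\|^2\\&\le\frac{2t_1^2}{L_1}\Big(F^1(x_1)-F^1(x^*)+\rho_1\frac{L_f^2}{2}+\mu_1\frac{L_g^2}{2}\Big)+\|p_1-x_1+x^*\|^2+\sum_{s=1}^{k}\frac{2t_{s+1}}{L_{s+1}}\Big(\rho_{s+1}\frac{L_f^2}{2}+\mu_{s+1}\frac{L_g^2}{2}\Big). \qquad (17)\end{aligned}$$

Since $x_1=y_1-\frac{1}{L_1}\nabla F^1(y_1)$ and $F^1(x^*)\ge F^1(y_1)+\langle\nabla F^1(y_1),x^*-y_1\rangle$, we get, by (9),

$$F^1(x_1)\le F^1(y_1)+\langle\nabla F^1(y_1),x_1-y_1\rangle+\frac{L_1}{2}\|x_1-y_1\|^2\le F^1(x^*)+\langle\nabla F^1(y_1),x_1-x^*\rangle+\frac{L_1}{2}\|x_1-y_1\|^2,$$

thus, as $t_1=1$ and $p_1=0$,

$$\frac{2t_1^2}{L_1}\big(F^1(x_1)-F^1(x^*)\big)+\|p_1-x_1+x^*\|^2\le2\langle x_1-y_1,x^*-y_1\rangle-\|x_1-y_1\|^2+\|x_1-x^*\|^2=\|y_1-x^*\|^2=\|x_0-x^*\|^2,$$

and this, together with (17), gives rise to the following estimate

$$\frac{2t_{k+1}^2}{L_{k+1}}\big(F(x_{k+1})-F(x^*)\big)\le\|x_0-x^*\|^2+\sum_{s=1}^{k+1}\frac{t_s}{L_s}\big(\rho_sL_f^2+\mu_sL_g^2\big)\quad\forall k\ge1. \qquad (18)$$

Furthermore, since $t_{k+1}\ge\frac12+t_k$ for any $k\ge1$, it follows that $t_{k+1}\ge\frac{k+2}{2}$, which, along with the fact that $L_k=\frac{1}{\rho_k}+\frac{\|K\|^2}{\mu_k}=(a+b\|K\|^2)k$, leads for any $k\ge1$ to the following estimate

$$\begin{aligned}F(x_{k+1})-F(x^*)&\le\frac{2(a+b\|K\|^2)(k+1)}{(k+2)^2}\Big(\|x_0-x^*\|^2+\sum_{s=1}^{k+1}\frac{t_s}{(a+b\|K\|^2)s}\Big(\frac{L_f^2}{as}+\frac{L_g^2}{bs}\Big)\Big)\\&\le\frac{2(a+b\|K\|^2)}{k+2}\|x_0-x^*\|^2+\frac{2}{k+2}\sum_{s=1}^{k+1}\frac{t_s}{s^2}\Big(\frac{L_f^2}{a}+\frac{L_g^2}{b}\Big).\end{aligned}$$

Using now that $t_{k+1}\le1+t_k$ for any $k\ge1$, it yields that $t_{k+1}\le k+1$ for any $k\ge0$, thus

$$\sum_{s=1}^{k+1}\frac{t_s}{s^2}\le\sum_{s=1}^{k+1}\frac{1}{s}\le1+\sum_{s=2}^{k+1}\int_{s-1}^{s}\frac{1}{x}\,dx=1+\int_1^{k+1}\frac{1}{x}\,dx=1+\ln(k+1).$$

Finally, we obtain that

$$F(x_{k+1})-F(x^*)\le\frac{2(a+b\|K\|^2)}{k+2}\|x_0-x^*\|^2+\frac{2(1+\ln(k+1))}{k+2}\Big(\frac{L_f^2}{a}+\frac{L_g^2}{b}\Big)\quad\forall k\ge1,$$

which concludes the proof. □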
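The proof only uses a handful of elementary facts about the sequence $(t_k)_{k\ge1}$: $\frac{k+2}{2}\le t_{k+1}\le k+1$, $t_{k+1}^2-t_{k+1}=t_k^2$ and $\sum_{s=1}^{k+1}t_s/s^2\le1+\ln(k+1)$; the identity $t_{k+1}^2=\sum_{s=1}^{k+1}t_s$, used later for Theorem 2, follows from the second one. The sketch below (our own check, with hypothetical names) verifies them numerically for the first few hundred terms.

```python
import math

def nesterov_t(num):
    """Return [t_1, ..., t_num] with t_1 = 1 and t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2."""
    t = [1.0]
    while len(t) < num:
        t.append((1.0 + math.sqrt(1.0 + 4.0 * t[-1] ** 2)) / 2.0)
    return t

def check_t_properties(num=500):
    """Numerically check the facts about (t_k) used in the proofs; t[k] stores t_{k+1}."""
    t = nesterov_t(num)
    running_sum = t[0]                                   # sum_{s=1}^{k+1} t_s
    for k in range(1, num):
        running_sum += t[k]
        assert (k + 2) / 2.0 - 1e-12 <= t[k] <= k + 1 + 1e-12              # (k+2)/2 <= t_{k+1} <= k+1
        assert abs(t[k] ** 2 - t[k] - t[k - 1] ** 2) <= 1e-9 * t[k] ** 2   # t_{k+1}^2 - t_{k+1} = t_k^2
        assert abs(t[k] ** 2 - running_sum) <= 1e-9 * running_sum          # t_{k+1}^2 = sum of t_s
    # sum_{s=1}^{num} t_s / s^2 <= 1 + ln(num)
    assert sum(ts / s ** 2 for s, ts in enumerate(t, start=1)) <= 1.0 + math.log(num)
    return True
```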



In the second part of this subsection we propose a variant of algorithm (A1) formulated with constant smoothing parameters:



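Initialization: $t_1=1$, $y_1=x_0\in\mathcal{H}$, $\rho,\mu\in\mathbb{R}_{++}$, $L(\rho,\mu)=\frac{1}{\rho}+\frac{\|K\|^2}{\mu}$
For $k\ge1$: $x_k=y_k-\frac{1}{L(\rho,\mu)}\Big(\operatorname{Prox}_{\frac{1}{\rho}f^*}\Big(\frac{y_k}{\rho}\Big)+K^*\operatorname{Prox}_{\frac{1}{\mu}g^*}\Big(\frac{Ky_k}{\mu}\Big)\Big)$,
    $t_{k+1}=\frac{1+\sqrt{1+4t_k^2}}{2}$,
    $y_{k+1}=x_k+\frac{t_k-1}{t_{k+1}}(x_k-x_{k-1})$        (A2)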



Constant smoothing parameters have also been used in [11] and [5, 6] within the framework of double smoothing algorithms, which regularize the Fenchel dual problem to (P) in two steps and, consequently, solve an unconstrained optimization problem with a strongly convex and differentiable objective function having a Lipschitz continuous gradient.

Theorem 2. Let $f:\mathcal{H}\to\mathbb{R}$ be a convex and $L_f$-Lipschitz continuous function, $g:\mathcal{K}\to\mathbb{R}$ a convex and $L_g$-Lipschitz continuous function, $K:\mathcal{H}\to\mathcal{K}$ a linear continuous operator and $x^*\in\mathcal{H}$ an optimal solution to (P). Then, when choosing for $\varepsilon>0$

$$\rho=\frac{2\varepsilon}{3L_f^2}\quad\text{and}\quad\mu=\frac{2\varepsilon}{3L_g^2},$$

algorithm (A2) generates a sequence $(x_k)_{k\ge1}\subseteq\mathcal{H}$ which provides an $\varepsilon$-optimal solution to (P) with a rate of convergence for the objective of order $O(\frac{1}{k})$.

Proof. In order to prove this statement, one only has to reproduce the first part of the proof of Theorem 1 with

$$\rho_k=\rho,\quad\mu_k=\mu\quad\text{and}\quad L_k=L(\rho,\mu)=\frac{1}{\rho}+\frac{\|K\|^2}{\mu}\quad\forall k\ge1,$$

which leads to (18). In this particular situation this inequality reads

$$F(x_{k+1})-F(x^*)\le\frac{L(\rho,\mu)\|x_0-x^*\|^2}{2t_{k+1}^2}+\frac{\rho L_f^2+\mu L_g^2}{2t_{k+1}^2}\sum_{s=1}^{k+1}t_s\quad\forall k\ge1.$$

Since $t_{k+1}^2=t_k^2+t_{k+1}$ for any $k\ge1$, one can inductively prove that $t_{k+1}^2=\sum_{s=1}^{k+1}t_s$, which, together with the fact that $t_{k+1}\ge\frac{k+2}{2}$ for any $k\ge1$, yields

$$F(x_{k+1})-F(x^*)\le\frac{2L(\rho,\mu)\|x_0-x^*\|^2}{(k+2)^2}+\rho\frac{L_f^2}{2}+\mu\frac{L_g^2}{2}\quad\forall k\ge1.$$

In order to obtain $\varepsilon$-optimality for the objective of problem (P), where $\varepsilon>0$ is a given level of accuracy, we choose $\rho=\frac{2\varepsilon}{3L_f^2}$ and $\mu=\frac{2\varepsilon}{3L_g^2}$ and, thus, we only have to force the first term on the right-hand side of the above estimate to be less than or equal to $\frac{\varepsilon}{3}$. Taking also into account that in this situation $L(\rho,\mu)=\frac{3(L_f^2+L_g^2\|K\|^2)}{2\varepsilon}$, it holds

$$\frac{\varepsilon}{3}\ge\frac{2L(\rho,\mu)\|x_0-x^*\|^2}{(k+2)^2}=\frac{3(L_f^2+L_g^2\|K\|^2)\|x_0-x^*\|^2}{\varepsilon(k+2)^2}\iff k+2\ge\frac{3\sqrt{L_f^2+L_g^2\|K\|^2}\,\|x_0-x^*\|}{\varepsilon},$$

which shows that an $\varepsilon$-optimal solution to (P) can be provided with a rate of convergence for the objective of order $O(\frac{1}{k})$. □
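The parameter choice of Theorem 2 and the resulting iteration count can be computed directly. The sketch below assumes that an upper bound `dist0` on $\|x_0-x^*\|$ and the constants $L_f$, $L_g$, $\|K\|$ are known, which in practice is itself an assumption; the function name is hypothetical.

```python
import math

def constant_smoothing_parameters(eps, L_f, L_g, norm_K, dist0):
    """Parameters of Theorem 2 for an accuracy eps > 0.

    Returns rho, mu, L(rho, mu) and the smallest k >= 1 with
    2 L(rho, mu) dist0^2 / (k + 2)^2 <= eps / 3.
    """
    rho = 2.0 * eps / (3.0 * L_f ** 2)
    mu = 2.0 * eps / (3.0 * L_g ** 2)
    L = 1.0 / rho + norm_K ** 2 / mu        # = 3 (L_f^2 + L_g^2 ||K||^2) / (2 eps)
    bound = 3.0 * math.sqrt(L_f ** 2 + L_g ** 2 * norm_K ** 2) * dist0 / eps
    k = max(1, math.ceil(bound) - 2)        # smallest k with k + 2 >= bound
    return rho, mu, L, k
```

For $\varepsilon=0.1$ and $L_f=L_g=\|K\|=\|x_0-x^*\|=1$ this gives $L(\rho,\mu)=30$ and $k=41$: the iteration count grows like $1/\varepsilon$, in line with the $O(\frac1k)$ rate, while the two smoothing terms $\rho L_f^2/2$ and $\mu L_g^2/2$ each contribute $\varepsilon/3$.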



The rate of convergence of algorithm (A1) is not as good as the one proved for the algorithm with constant smoothing parameters depending on a fixed level of accuracy $\varepsilon>0$. However, the main advantage of the variable smoothing method is that the sequence of objective values $(f(x_k)+g(Kx_k))_{k\ge1}$ converges to the optimal objective value of (P), whereas, when generated by algorithm (A2), this sequence may not converge to it, despite approximating the optimal objective value with a better convergence rate.



3.3 The case when $f$ is differentiable with Lipschitz continuous gradient



In this subsection we show how the algorithms (A1) and (A2) for solving problem (P) can be adapted to the situation when $f$ is a differentiable function with Lipschitz continuous gradient. We provide iterative schemes with variable and constant smoothing parameters and corresponding convergence statements. More precisely, we deal with the optimization problem

$$(P)\qquad\inf_{x\in\mathcal{H}}\{f(x)+g(Kx)\},$$

where $K:\mathcal{H}\to\mathcal{K}$ is a linear continuous operator, $f:\mathcal{H}\to\mathbb{R}$ is a convex and differentiable function with $L_{\nabla f}$-Lipschitz continuous gradient and $g:\mathcal{K}\to\mathbb{R}$ is a convex and $L_g$-Lipschitz continuous function.



Algorithm (A1) can be adapted to this framework as follows:

Initialization: $t_1=1$, $y_1=x_0\in\mathcal{H}$, $(\mu_k)_{k\ge1}\subseteq\mathbb{R}_{++}$
For $k\ge1$: $L_k=L_{\nabla f}+\frac{\|K\|^2}{\mu_k}$,
    $x_k=y_k-\frac{1}{L_k}\Big(\nabla f(y_k)+K^*\operatorname{Prox}_{\frac{1}{\mu_k}g^*}\Big(\frac{Ky_k}{\mu_k}\Big)\Big)$,
    $t_{k+1}=\frac{1+\sqrt{1+4t_k^2}}{2}$,
    $y_{k+1}=x_k+\frac{t_k-1}{t_{k+1}}(x_k-x_{k-1})$        (A3)

Its convergence is guaranteed by the following theorem.

Theorem 3. Let $f:\mathcal{H}\to\mathbb{R}$ be a convex and differentiable function with $L_{\nabla f}$-Lipschitz continuous gradient, $g:\mathcal{K}\to\mathbb{R}$ a convex and $L_g$-Lipschitz continuous function, $K:\mathcal{H}\to\mathcal{K}$ a nonzero linear continuous operator and $x^*\in\mathcal{H}$ an optimal solution to (P). Then, when choosing

$$\mu_k=\frac{1}{bk}\quad\forall k\ge1,$$

where $b\in\mathbb{R}_{++}$, algorithm (A3) generates a sequence $(x_k)_{k\ge1}\subseteq\mathcal{H}$ satisfying for any $k\ge1$

$$F(x_{k+1})-F(x^*)\le\frac{2(L_{\nabla f}+b\|K\|^2)}{k+2}\|x_0-x^*\|^2+\frac{2(1+\ln(k+1))L_g^2(L_{\nabla f}+b\|K\|^2)}{(k+2)\,b^2\|K\|^2}, \qquad (19)$$

thus yielding a rate of convergence for the objective of order $O\big(\frac{\ln k}{k}\big)$.

Proof. For any $k\ge1$ we denote by $F^k:\mathcal{H}\to\mathbb{R}$, $F^k(x)=f(x)+{}^{\mu_k}g(Kx)$. For any $k\ge1$ and every $x\in\mathcal{H}$ it holds $\nabla F^k(x)=\nabla f(x)+K^*\operatorname{Prox}_{\frac{1}{\mu_k}g^*}\big(\frac{Kx}{\mu_k}\big)$, and $\nabla F^k$ is $L_k$-Lipschitz continuous, where $L_k=L_{\nabla f}+\frac{\|K\|^2}{\mu_k}$.

As in the proof of Theorem 1, by defining $p_k:=(t_k-1)(x_{k-1}-x_k)$ and using the analogues of (11) and (12) in which the terms involving the smoothing of $f$ are absent, we obtain for any $k\ge1$

$$\begin{aligned}&\|p_{k+1}-x_{k+1}+x^*\|^2-\|p_k-x_k+x^*\|^2\\&\quad\le\frac{2t_k^2}{L_{k+1}}\big(F^{k+1}(x_k)-F^{k+1}(x^*)\big)-\frac{2t_{k+1}^2}{L_{k+1}}\big(F^{k+1}(x_{k+1})-F^{k+1}(x^*)\big)\\&\quad\le\frac{2t_k^2}{L_{k+1}}\Big(F^k(x_k)-F^k(x^*)+(\mu_k-\mu_{k+1})\frac{L_g^2}{2}\Big)-\frac{2t_{k+1}^2}{L_{k+1}}\big(F^{k+1}(x_{k+1})-F^{k+1}(x^*)\big)\\&\quad\le\frac{2t_k^2}{L_k}\Big(F^k(x_k)-F^k(x^*)+\mu_k\frac{L_g^2}{2}\Big)-\frac{2t_{k+1}^2}{L_{k+1}}\Big(F^{k+1}(x_{k+1})-F^{k+1}(x^*)+\mu_{k+1}\frac{L_g^2}{2}\Big)+\frac{t_{k+1}}{L_{k+1}}\mu_{k+1}L_g^2\end{aligned}$$

and, consequently,

$$\begin{aligned}&\|p_{k+1}-x_{k+1}+x^*\|^2+\frac{2t_{k+1}^2}{L_{k+1}}\Big(F^{k+1}(x_{k+1})-F^{k+1}(x^*)+\mu_{k+1}\frac{L_g^2}{2}\Big)\\&\quad\le\|p_k-x_k+x^*\|^2+\frac{2t_k^2}{L_k}\Big(F^k(x_k)-F^k(x^*)+\mu_k\frac{L_g^2}{2}\Big)+\frac{t_{k+1}\mu_{k+1}L_g^2}{L_{k+1}}.\end{aligned}$$

For any $k\ge1$ it holds

$$\begin{aligned}\frac{2t_{k+1}^2}{L_{k+1}}\big(F(x_{k+1})-F(x^*)\big)&\le\frac{2t_{k+1}^2}{L_{k+1}}\Big(F^{k+1}(x_{k+1})-F^{k+1}(x^*)+\mu_{k+1}\frac{L_g^2}{2}\Big)+\|p_{k+1}-x_{k+1}+x^*\|^2\\&\le\frac{2t_1^2}{L_1}\Big(F^1(x_1)-F^1(x^*)+\mu_1\frac{L_g^2}{2}\Big)+\|p_1-x_1+x^*\|^2+\sum_{s=1}^{k}\frac{t_{s+1}\mu_{s+1}L_g^2}{L_{s+1}},\end{aligned}$$

which, as in the proof of Theorem 1, yields

$$\frac{2t_{k+1}^2}{L_{k+1}}\big(F(x_{k+1})-F(x^*)\big)\le\|x_0-x^*\|^2+\sum_{s=1}^{k+1}\frac{t_s\mu_sL_g^2}{L_s}. \qquad (20)$$

For any $k\ge1$, since $t_{k+1}\ge\frac{k+2}{2}$ and $L_k=L_{\nabla f}+\frac{\|K\|^2}{\mu_k}=L_{\nabla f}+b\|K\|^2k$, it follows

$$F(x_{k+1})-F(x^*)\le\frac{2(L_{\nabla f}+b\|K\|^2(k+1))}{(k+2)^2}\Big(\|x_0-x^*\|^2+\sum_{s=1}^{k+1}\frac{t_sL_g^2}{(L_{\nabla f}+b\|K\|^2s)\,sb}\Big).$$

Thus, for any $k\ge1$, since $t_k\le k$, it yields

$$\begin{aligned}F(x_{k+1})-F(x^*)&\le\frac{2(L_{\nabla f}+b\|K\|^2(k+1))}{(k+2)^2}\Big(\|x_0-x^*\|^2+\frac{L_g^2}{b^2\|K\|^2}\sum_{s=1}^{k+1}\frac{1}{s}\Big)\\&\le\frac{2(L_{\nabla f}+b\|K\|^2)(k+1)}{(k+2)^2}\Big(\|x_0-x^*\|^2+\frac{L_g^2(1+\ln(k+1))}{b^2\|K\|^2}\Big)\\&\le\frac{2(L_{\nabla f}+b\|K\|^2)}{k+2}\|x_0-x^*\|^2+\frac{2(1+\ln(k+1))L_g^2(L_{\nabla f}+b\|K\|^2)}{(k+2)\,b^2\|K\|^2}.\end{aligned}$$

□



By adapting (A3) to the framework considered in this subsection we obtain the following algorithm with constant smoothing parameters:

Initialization: $t_1=1$, $y_1=x_0\in\mathcal{H}$, $\mu\in\mathbb{R}_{++}$, $L(\mu)=L_{\nabla f}+\frac{\|K\|^2}{\mu}$
For $k\ge1$: $x_k=y_k-\frac{1}{L(\mu)}\Big(\nabla f(y_k)+K^*\operatorname{Prox}_{\frac{1}{\mu}g^*}\Big(\frac{Ky_k}{\mu}\Big)\Big)$,
    $t_{k+1}=\frac{1+\sqrt{1+4t_k^2}}{2}$,
    $y_{k+1}=x_k+\frac{t_k-1}{t_{k+1}}(x_k-x_{k-1})$        (A4)



The convergence of algorithm (A4) is stated in the following theorem, which can be proved along the lines of the proof of Theorem 3.

Theorem 4. Let $f:\mathcal{H}\to\mathbb{R}$ be a convex and differentiable function with $L_{\nabla f}$-Lipschitz continuous gradient, $g:\mathcal{K}\to\mathbb{R}$ a convex and $L_g$-Lipschitz continuous function, $K:\mathcal{H}\to\mathcal{K}$ a nonzero linear continuous operator and $x^*\in\mathcal{H}$ an optimal solution to (P). Then, for $\varepsilon>0$ and the smoothing parameter $\mu\in\mathbb{R}_{++}$ chosen in dependence of $\varepsilon$, algorithm (A4) generates a sequence $(x_k)_{k\ge1}\subseteq\mathcal{H}$ which provides an $\varepsilon$-optimal solution to (P) with a rate of convergence for the objective of order $O(\frac{1}{k})$.



3.4 The optimization problem with the sum of more than two functions in the objective

We close this section by discussing the employment of the algorithmic schemes presented in the previous two subsections for the optimization problem (2)

$$\inf_{x\in\mathcal{H}}\Big\{f(x)+\sum_{i=1}^m g_i(K_ix)\Big\},$$

where $\mathcal{H}$ and $\mathcal{K}_i$, $i=1,\dots,m$, are real Hilbert spaces, $f:\mathcal{H}\to\mathbb{R}$ is a convex function which is either $L_f$-Lipschitz continuous or differentiable with $L_{\nabla f}$-Lipschitz continuous gradient, $g_i:\mathcal{K}_i\to\mathbb{R}$ are convex and $L_{g_i}$-Lipschitz continuous functions and $K_i:\mathcal{H}\to\mathcal{K}_i$, $i=1,\dots,m$, are linear continuous operators. By endowing $\mathcal{K}:=\mathcal{K}_1\times\dots\times\mathcal{K}_m$ with the inner product defined as

$$\langle y,z\rangle=\sum_{i=1}^m\langle y_i,z_i\rangle\quad\forall y,z\in\mathcal{K},$$

and with the corresponding norm, and by defining $g:\mathcal{K}\to\mathbb{R}$, $g(y_1,\dots,y_m)=\sum_{i=1}^m g_i(y_i)$ and $K:\mathcal{H}\to\mathcal{K}$, $Kx=(K_1x,\dots,K_mx)$, problem (2) can be equivalently written as

$$\inf_{x\in\mathcal{H}}\{f(x)+g(Kx)\}$$

and, consequently, solved via one of the variable or constant smoothing algorithms introduced in Subsections 3.2 and 3.3, depending on the properties of the function $f$.

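In the following we determine the elements related to the function $g$ constructed above which appear in these iterative schemes and in the corresponding convergence statements. Obviously, the function $g$ is convex and, since for every $(y_1,\dots,y_m),(z_1,\dots,z_m)\in\mathcal{K}$

$$|g(y_1,\dots,y_m)-g(z_1,\dots,z_m)|\le\sum_{i=1}^m L_{g_i}\|y_i-z_i\|\le\Big(\sum_{i=1}^m L_{g_i}^2\Big)^{\frac12}\|(y_1,\dots,y_m)-(z_1,\dots,z_m)\|,$$

it is $\big(\sum_{i=1}^m L_{g_i}^2\big)^{\frac12}$-Lipschitz continuous. On the other hand, for each $\mu\in\mathbb{R}_{++}$ and $(y_1,\dots,y_m)\in\mathcal{K}$ it holds

$${}^{\mu}g(y_1,\dots,y_m)=\sum_{i=1}^m{}^{\mu}g_i(y_i),$$

thus

$$\nabla({}^{\mu}g)(y_1,\dots,y_m)=\big(\nabla({}^{\mu}g_1)(y_1),\dots,\nabla({}^{\mu}g_m)(y_m)\big)=\Big(\operatorname{Prox}_{\frac{1}{\mu}g_1^*}\Big(\frac{y_1}{\mu}\Big),\dots,\operatorname{Prox}_{\frac{1}{\mu}g_m^*}\Big(\frac{y_m}{\mu}\Big)\Big).$$

Since $K^*(y_1,\dots,y_m)=\sum_{i=1}^mK_i^*y_i$ for every $(y_1,\dots,y_m)\in\mathcal{K}$, we have

$$\nabla({}^{\mu}g\circ K)(x)=K^*\nabla({}^{\mu}g)(K_1x,\dots,K_mx)=\sum_{i=1}^mK_i^*\nabla({}^{\mu}g_i)(K_ix)\quad\forall x\in\mathcal{H}.$$

Finally, we notice that for arbitrary $x,y\in\mathcal{H}$ one has

$$\|\nabla({}^{\mu}g\circ K)(x)-\nabla({}^{\mu}g\circ K)(y)\|\le\sum_{i=1}^m\|K_i\|\,\|\nabla({}^{\mu}g_i)(K_ix)-\nabla({}^{\mu}g_i)(K_iy)\|\le\sum_{i=1}^m\frac{\|K_i\|}{\mu}\|K_ix-K_iy\|\le\frac{\sum_{i=1}^m\|K_i\|^2}{\mu}\|x-y\|,$$

which shows that the Lipschitz constant of $\nabla({}^{\mu}g\circ K)$ can be taken as $\frac{\sum_{i=1}^m\|K_i\|^2}{\mu}$.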
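The block-wise gradient formula can be checked numerically: summing $K_i^*\nabla({}^{\mu}g_i)(K_ix)$ over the blocks gives the same vector as working with the stacked operator $Kx=(K_1x,\dots,K_mx)$. The sketch assumes, for illustration only, Euclidean spaces and $g_i=\|\cdot\|_1$, for which $\nabla({}^{\mu}g_i)(z)=\mathcal{P}_{[-1,1]}(z/\mu)$ componentwise; all names are hypothetical.

```python
def matvec(M, x):
    """M x for a matrix stored as a list of rows."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

def rmatvec(M, y):
    """M^T y, the adjoint of M applied to y."""
    return [sum(M[i][j] * y[i] for i in range(len(M))) for j in range(len(M[0]))]

def clip(v, r=1.0):
    """Componentwise projection onto the box [-r, r]^n."""
    return [min(max(vi, -r), r) for vi in v]

def grad_blockwise(Ks, x, mu):
    """sum_i K_i^T grad(mu g_i)(K_i x) with g_i = ||.||_1, where grad(mu g_i)(z) = clip(z / mu)."""
    total = [0.0] * len(x)
    for Ki in Ks:
        gi = rmatvec(Ki, clip([z / mu for z in matvec(Ki, x)]))
        total = [s + v for s, v in zip(total, gi)]
    return total

def grad_stacked(Ks, x, mu):
    """K^T grad(mu g)(K x) for the stacked operator K x = (K_1 x, ..., K_m x) and g = sum_i g_i."""
    K = [row for Ki in Ks for row in Ki]
    return rmatvec(K, clip([z / mu for z in matvec(K, x)]))
```

Stacking the rows of the $K_i$ realizes $K$, and because $g$ is separable its smoothed gradient is applied block by block, so both functions compute the same quantity.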



4 Numerical experiments 
4.1 Image processing 

The first numerical experiment involving the variable smoothing algorithm concerns the solving of an extremely ill-conditioned linear inverse problem which arises in the field of signal and image processing, by basically solving the regularized nondifferentiable convex optimization problem

$$\inf_{x\in\mathbb{R}^n}\{\|Ax-b\|_1+\lambda\|Wx\|_1\}, \qquad (21)$$

where $b\in\mathbb{R}^n$ is the blurred and noisy image, $A:\mathbb{R}^n\to\mathbb{R}^n$ is a blurring operator, $W:\mathbb{R}^n\to\mathbb{R}^n$ is the discrete Haar wavelet transform with four levels and $\lambda>0$ is the regularization parameter. The blurring operator is constructed by making use of the Matlab routines imfilter and fspecial as follows:



```matlab
H = fspecial('gaussian', 9, 4);           % Gaussian blur of size 9x9
                                          % and standard deviation 4
B = imfilter(X, H, 'conv', 'symmetric');  % B = observed blurred image
                                          % X = original image
```



The function fspecial returns a rotationally symmetric Gaussian lowpass filter of size 9×9 with standard deviation 4, the entries of H being nonnegative and summing up to 1. The function imfilter convolves the filter H with the image X and furnishes the blurred image B. The boundary option 'symmetric' corresponds to reflexive boundary conditions. Thanks to the rotationally symmetric filter H, the linear operator A defined via the routine imfilter is symmetric, too. By making use of the real spectral decomposition of A, it follows that $\|A\|^2=1$. Furthermore, since W is an orthogonal wavelet, it holds $\|W\|^2=1$.
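For readers without Matlab, the filter used above can be approximated in plain Python. This is a sketch that follows our understanding of the documented construction of fspecial('gaussian', 9, 4) (sampled Gaussian on a centered grid, normalized to sum 1); fspecial additionally zeroes entries below machine precision relative to the maximum, which does not affect a 9×9 filter with σ = 4. The function name is hypothetical.

```python
import math

def gaussian_filter(size=9, sigma=4.0):
    """Normalized, rotationally symmetric Gaussian filter on a size x size grid."""
    c = (size - 1) / 2.0                                  # grid centered at the middle entry
    h = [[math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2.0 * sigma ** 2))
          for j in range(size)] for i in range(size)]
    s = sum(sum(row) for row in h)
    return [[v / s for v in row] for row in h]            # entries sum up to 1
```

The nonnegativity and unit sum of the entries mean that convolution with this filter averages neighboring pixels, which is why it acts as a lowpass (blurring) operator.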



The optimization problem (21) can be written as

$$\inf_{x\in\mathbb{R}^n}\{f(x)+g_1(Ax)+g_2(Wx)\},$$

where $f:\mathbb{R}^n\to\mathbb{R}$ is taken to be $f\equiv0$, with the Lipschitz constant of its gradient $L_{\nabla f}=0$, $g_1:\mathbb{R}^n\to\mathbb{R}$, $g_1(y)=\|y-b\|_1$, is convex and $\sqrt{n}$-Lipschitz continuous and $g_2:\mathbb{R}^n\to\mathbb{R}$, $g_2(y)=\lambda\|y\|_1$, is convex and $\lambda\sqrt{n}$-Lipschitz continuous. For every $p\in\mathbb{R}^n$ it holds $g_1^*(p)=\delta_{[-1,1]^n}(p)+p^Tb$ and $g_2^*(p)=\delta_{[-\lambda,\lambda]^n}(p)$ (see, for instance, [3]). We solved this problem with algorithm (A3), making also use of the considerations in Subsection 3.4, and computed to this aim, for $\mu\in\mathbb{R}_{++}$ and $x\in\mathbb{R}^n$,



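$$\begin{aligned}\operatorname{Prox}_{\frac{1}{\mu}g_1^*}\Big(\frac{Ax}{\mu}\Big)&=\operatorname*{arg\,min}_{p\in\mathbb{R}^n}\Big\{\frac{1}{\mu}g_1^*(p)+\frac12\Big\|\frac{Ax}{\mu}-p\Big\|^2\Big\}=\operatorname*{arg\,min}_{p\in[-1,1]^n}\Big\{\frac{p^Tb}{\mu}+\frac12\Big\|\frac{Ax}{\mu}-p\Big\|^2\Big\}\\&=\operatorname*{arg\,min}_{p\in[-1,1]^n}\frac12\Big\|p-\frac{Ax-b}{\mu}\Big\|^2=\mathcal{P}_{[-1,1]^n}\Big(\frac{Ax-b}{\mu}\Big)\end{aligned}$$

and

$$\operatorname{Prox}_{\frac{1}{\mu}g_2^*}\Big(\frac{Wx}{\mu}\Big)=\operatorname*{arg\,min}_{p\in\mathbb{R}^n}\Big\{\frac{1}{\mu}g_2^*(p)+\frac12\Big\|\frac{Wx}{\mu}-p\Big\|^2\Big\}=\operatorname*{arg\,min}_{p\in[-\lambda,\lambda]^n}\frac12\Big\|p-\frac{Wx}{\mu}\Big\|^2=\mathcal{P}_{[-\lambda,\lambda]^n}\Big(\frac{Wx}{\mu}\Big).$$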
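Both proximal points are componentwise projections, which the following sketch implements and cross-checks against a brute-force grid minimization of the first problem in one coordinate. The function names and the grid resolution are our own choices.

```python
def prox_g1_conj(Ax, b, mu):
    """Prox_{(1/mu) g1*}(Ax / mu) = P_{[-1,1]^n}((Ax - b) / mu) for g1(y) = ||y - b||_1."""
    return [min(max((ai - bi) / mu, -1.0), 1.0) for ai, bi in zip(Ax, b)]

def prox_g2_conj(Wx, lam, mu):
    """Prox_{(1/mu) g2*}(Wx / mu) = P_{[-lam,lam]^n}(Wx / mu) for g2(y) = lam ||y||_1."""
    return [min(max(wi / mu, -lam), lam) for wi in Wx]

def brute_force_coordinate(a, b, mu, grid=20001):
    """Minimize p b / mu + 0.5 (a / mu - p)^2 over p in [-1, 1] on a uniform grid (one coordinate)."""
    best_p, best_val = None, float("inf")
    for i in range(grid):
        p = -1.0 + 2.0 * i / (grid - 1)
        val = p * b / mu + 0.5 * (a / mu - p) ** 2
        if val < best_val:
            best_p, best_val = p, val
    return best_p
```

Since both objectives are separable across coordinates, a one-dimensional check per coordinate suffices; in the second case the linear term is absent and the projection is onto $[-\lambda,\lambda]$.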
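Hence, choosing $\mu_k=\frac{1}{ak}$ for some parameter $a\in\mathbb{R}_{++}$, and taking into account that $L_k=L_{\nabla f}+\frac{\|A\|^2+\|W\|^2}{\mu_k}$ becomes $2ak$ for $k\ge1$, the iterative scheme (A3) with starting point $b\in\mathbb{R}^n$ reads:

Initialization: $t_1=1$, $y_1=x_0=b\in\mathbb{R}^n$, $a>0$
For $k\ge1$: $\mu_k=\frac{1}{ak}$,
    $x_k=y_k-\frac{\mu_k}{2}\Big(A^T\mathcal{P}_{[-1,1]^n}\Big(\frac{Ay_k-b}{\mu_k}\Big)+W^T\mathcal{P}_{[-\lambda,\lambda]^n}\Big(\frac{Wy_k}{\mu_k}\Big)\Big)$,
    $t_{k+1}=\frac{1+\sqrt{1+4t_k^2}}{2}$,
    $y_{k+1}=x_k+\frac{t_k-1}{t_{k+1}}(x_k-x_{k-1})$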
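The specialized scheme can be tried on a toy stand-in for (21). The sketch below is ours and rests on two loudly stated simplifications: the blur $A$ is replaced by the identity (symmetric, $\|A\|=1$), and $W$ is a one-level orthonormal Haar transform acting on consecutive pairs of entries (symmetric and orthogonal, so $W^T=W$, $\|W\|=1$ and $L_k=2ak$ still holds). All names are hypothetical.

```python
import math

def haar(v):
    """One-level orthonormal Haar transform on consecutive pairs (toy W; symmetric, W^T = W)."""
    r = 1.0 / math.sqrt(2.0)
    out = []
    for i in range(0, len(v), 2):
        out += [r * (v[i] + v[i + 1]), r * (v[i] - v[i + 1])]
    return out

haar_adjoint = haar  # each 2x2 block (1/sqrt 2)[[1, 1], [1, -1]] is symmetric

def clip(v, r):
    """Componentwise projection onto [-r, r]^n."""
    return [min(max(vi, -r), r) for vi in v]

def objective(x, b, lam):
    """||A x - b||_1 + lam ||W x||_1 with the toy choices A = I and W = haar."""
    return sum(abs(xi - bi) for xi, bi in zip(x, b)) + lam * sum(abs(w) for w in haar(x))

def vs_deblur_toy(b, lam, a=1.0, iters=1000):
    """Variable smoothing scheme above with A = I, W = haar, mu_k = 1/(ak), step 1/L_k = mu_k / 2."""
    x_prev, y, t = list(b), list(b), 1.0                          # y_1 = x_0 = b
    for k in range(1, iters + 1):
        mu = 1.0 / (a * k)
        p1 = clip([(yi - bi) / mu for yi, bi in zip(y, b)], 1.0)  # P_[-1,1]^n((A y - b) / mu)
        p2 = clip([w / mu for w in haar(y)], lam)                 # P_[-lam,lam]^n(W y / mu)
        grad = [u + v for u, v in zip(p1, haar_adjoint(p2))]      # A^T p1 + W^T p2
        x = [yi - (mu / 2.0) * gi for yi, gi in zip(y, grad)]
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [xi + (t - 1.0) / t_next * (xi - xp) for xi, xp in zip(x, x_prev)]
        x_prev, t = x, t_next
    return x_prev
```

For this toy and $\lambda\ge\sqrt2$, a short subgradient check shows that $x^*=0$ is a minimizer (for data with nonzero entries), so the optimal value is $\|b\|_1$ and the output can be compared with the bound (19), taking $L_{\nabla f}=0$, $\|K\|^2=2$ and $L_g^2=n(1+\lambda^2)$.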



We considered the 256×256 cameraman test image, which is part of the image processing toolbox in Matlab, and vectorized it (to a vector of dimension $n=256^2=65536$) and normalized it, in order to make the pixels range in the closed interval from 0 (pure black) to 1 (pure white). In addition, we added normally distributed white Gaussian noise with standard deviation $10^{-3}$ and set the regularization parameter to $\lambda=2\text{e-}5$. The original and observed images are shown in Figure 4.1.

Figure 4.1: The 256 × 256 cameraman test image (panels: original; blurred and noisy).

When measuring the quality of the restored images, we made use of the improvement in signal-to-noise ratio (ISNR), which is defined as

$$\mathrm{ISNR}_k=10\log_{10}\Big(\frac{\|x-b\|^2}{\|x-x_k\|^2}\Big),$$



where $x$, $b$ and $x_k$ denote the original, the observed and the estimated image at iteration $k\ge1$, respectively. We tested several values for $a\in\mathbb{R}_{++}$ and obtained, after 100 iterations, the objective values and the ISNR values presented in Table 4.1. In the context of solving problem (21) we compared the variable smoothing approach (VS) for $a=1\text{e-}1$ with the operator-splitting algorithm based on skew splitting (SS) proposed in [8, 10], with parameters $\varepsilon=\frac{1}{2(\sqrt2+1)}$ and $\gamma_k=\dots$ for any $k\ge1$, and with the primal-dual algorithm (PD) from [9], with parameters $\theta=1$, $\sigma=0.01$ and $\tau=49.999$. The parameters considered for the three approaches provide the best results when solving (21).

| a    | 1e-4    | 1e-3   | 1e-2   | 1e-1   | 1      | 1e+1   | 1e+2    | 1e+3    |
|------|---------|--------|--------|--------|--------|--------|---------|---------|
| fval | 164.621 | 80.915 | 55.763 | 53.669 | 53.579 | 63.754 | 208.413 | 531.022 |
| ISNR | 1.282   | 3.839  | 5.241  | 5.352  | 5.337  | 4.351  | 1.180   | 0.199   |

Table 4.1: Objective values (fval) and ISNR values (higher is better) after 100 iterations.

The output of these three algorithms after 100 iterations, along with the corresponding objective values, can be seen in Figure 4.2 and shows that the variable smoothing approach outperforms the other two methods.

Figure 4.2: Results furnished by the primal-dual (PD), the skew splitting (SS) and the variable smoothing (VS) algorithms after 100 iterations (panel titles: PD_100 = 124.109283, SS_100 = 256.427780, VS_100 = 53.668543).

Figure 4.3 shows the evolution of the values of the objective function and of the improvement in signal-to-noise ratio within the first 100 iterations.



4.2 Support vector machines classification 

The second numerical experiment we consider for the variable smoothing algorithm concerns the problem of classifying images via support vector machines classification, an approach which belongs to the class of kernel-based learning methods.

The given data set, consisting of 5268 images of size 200 × 50, stems from a real-world problem that a supplier of the automotive industry faced when establishing a computer-aided quality control for manufactured devices at the end of the manufacturing process (see [4] for more details on this data set). The overall task is to classify fine and defective components, which are labeled by $+1$ and $-1$, respectively.

The classifier functional $f$ is assumed to be an element of the Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}_\kappa$, which in our case is induced by the symmetric and finitely positive definite Gaussian kernel function

$$\kappa : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}, \quad \kappa(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right).$$
Figure 4.3: The evolution of the values of the objective function and of the ISNR for the primal-dual (PD), the skew splitting (SS) and the variable smoothing (VS) algorithms within the first 100 iterations.

Let $\langle\cdot,\cdot\rangle_\kappa$ denote the inner product on $\mathcal{H}_\kappa$, $\|\cdot\|_\kappa$ the corresponding norm and $K \in \mathbb{R}^{n \times n}$ the Gram matrix with respect to the training data set $\mathcal{Z} = \{(X_1, Y_1), \ldots, (X_n, Y_n)\} \subseteq \mathbb{R}^d \times \{+1, -1\}$, namely the symmetric and positive definite matrix with entries $K_{ij} = \kappa(X_i, X_j)$ for $i, j = 1, \ldots, n$. Within this example we make use of the hinge loss $v : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, $v(x, y) = \max\{1 - xy, 0\}$, which penalizes the deviation between the predicted value $f(x)$ and the true value $y \in \{+1, -1\}$. The smoothness of the decision function $f \in \mathcal{H}_\kappa$ is measured by means of the smoothness functional $\Omega : \mathcal{H}_\kappa \to \mathbb{R}$, $\Omega(f) = \|f\|_\kappa^2$, which takes high values for non-smooth functions and low values for smooth ones. The decision function $f$ we are looking for is the optimal solution of the Tikhonov regularization problem

$$\inf_{f \in \mathcal{H}_\kappa} \left\{ \frac{1}{2}\Omega(f) + C \sum_{i=1}^n v(f(X_i), Y_i) \right\}, \quad (22)$$

where $C > 0$ denotes the regularization parameter controlling the tradeoff between the loss function and the smoothness functional.
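The ingredients of (22) can be illustrated with a small sketch, assuming the data points are stored as lists of floats; the helper names `gaussian_kernel`, `gram_matrix` and `hinge` are hypothetical and only mirror the definitions of $\kappa$, $K$ and $v$ given above:

```python
import math

def gaussian_kernel(x, y, sigma):
    """kappa(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(points, sigma):
    """Gram matrix with entries K_ij = kappa(X_i, X_j)."""
    return [[gaussian_kernel(p, q, sigma) for q in points] for p in points]

def hinge(x, y):
    """Hinge loss v(x, y) = max{1 - x*y, 0}."""
    return max(1.0 - x * y, 0.0)

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = gram_matrix(X, sigma=0.5)
print(K[0][0], K[0][1] == K[1][0])  # 1.0 True
print(hinge(0.5, 1), hinge(-2.0, -1), hinge(0.5, -1))  # 0.5 0.0 1.5
```

The diagonal of $K$ is always 1 for the Gaussian kernel, and the hinge loss vanishes exactly when the prediction has the correct sign with margin $xy \geq 1$.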

The representer theorem (cf. [18]) ensures the existence of a vector of coefficients $c = (c_1, \ldots, c_n)^T \in \mathbb{R}^n$ such that the minimizer $f$ of (22) can be expressed as a kernel expansion in terms of the training data, i.e., $f(\cdot) = \sum_{i=1}^n c_i \kappa(\cdot, X_i)$. Thus, the smoothness functional becomes $\Omega(f) = \|f\|_\kappa^2 = \langle f, f\rangle_\kappa = \sum_{i=1}^n \sum_{j=1}^n c_i c_j \kappa(X_i, X_j) = c^T K c$ and, for $i = 1, \ldots, n$, it holds $f(X_i) = \sum_{j=1}^n c_j \kappa(X_i, X_j) = (Kc)_i$. Hence, in order to determine the decision function one has to solve the convex optimization problem

$$\inf_{c \in \mathbb{R}^n} \left\{ f(c) + \sum_{i=1}^n g_i(Kc) \right\}, \quad (23)$$

where $f : \mathbb{R}^n \to \mathbb{R}$, $f(c) = \frac{1}{2} c^T K c$, and $g_i : \mathbb{R}^n \to \mathbb{R}$, $g_i(c) = C v(c_i, Y_i)$ for $i = 1, \ldots, n$. The function $f$ is convex and differentiable and it fulfills $\nabla f(c) = Kc$ for every $c \in \mathbb{R}^n$; thus $\nabla f$ is Lipschitz continuous with Lipschitz constant $L_{\nabla f} = \|K\|$. For any $i = 1, \ldots, n$ the function $g_i$ is convex and $C$-Lipschitz continuous, properties which allowed us to solve problem (23) with algorithm (A3), by using also the considerations made in Subsection 3.4. For any $i = 1, \ldots, n$ and every $p = (p_1, \ldots, p_n)^T \in \mathbb{R}^n$ it holds (see also [?])

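$$g_i^*(p) = \sup_{c \in \mathbb{R}^n}\left\{\langle p, c\rangle - C v(c_i, Y_i)\right\} = C \sup_{c \in \mathbb{R}^n}\left\{\left\langle \frac{p}{C}, c\right\rangle - v(c_i, Y_i)\right\} = \begin{cases} C\,\frac{p_i Y_i}{C}, & \text{if } p_j = 0,\ j \neq i, \text{ and } \frac{p_i Y_i}{C} \in [-1, 0], \\ +\infty, & \text{otherwise}, \end{cases} = \begin{cases} p_i Y_i, & \text{if } p_j = 0,\ j \neq i, \text{ and } p_i Y_i \in [-C, 0], \\ +\infty, & \text{otherwise}. \end{cases}$$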

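Thus, for $\mu \in \mathbb{R}_{++}$, $c = (c_1, \ldots, c_n)^T \in \mathbb{R}^n$ and $i = 1, \ldots, n$ we have

$$\operatorname{Prox}_{\frac{1}{\mu}g_i^*}\left(\frac{c}{\mu}\right) = \operatorname*{argmin}_{p \in \mathbb{R}^n}\left\{\frac{1}{\mu}g_i^*(p) + \frac{1}{2}\left\|p - \frac{c}{\mu}\right\|^2\right\} = \operatorname*{argmin}_{\substack{p_j = 0,\ j \neq i\\ p_i Y_i \in [-C, 0]}}\left\{\frac{1}{\mu}p_i Y_i + \frac{1}{2}\left(p_i - \frac{c_i}{\mu}\right)^2\right\}.$$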
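The closed form of $g_i^*$ used in this computation can be sanity-checked numerically. Since $g_i$ depends only on $c_i$, the sketch below works with the scalar function $x \mapsto C\max\{1 - xY_i, 0\}$ and approximates its conjugate by a supremum over a finite grid. The helper names are hypothetical, and a grid can only signal $+\infty$ by large finite values:

```python
def hinge_conjugate_closed_form(p, y, C):
    """(C * v(., y))^*(p) = p*y if p*y lies in [-C, 0], +inf otherwise."""
    return p * y if -C <= p * y <= 0 else float("inf")

def hinge_conjugate_grid(p, y, C, radius=50.0, steps=20000):
    """Approximate sup_x {p*x - C*max(1 - x*y, 0)} on a uniform grid over [-radius, radius]."""
    best = float("-inf")
    for j in range(steps + 1):
        x = -radius + 2.0 * radius * j / steps
        best = max(best, p * x - C * max(1.0 - x * y, 0.0))
    return best

# Inside the domain the two values agree (the supremum is attained at x = y).
print(hinge_conjugate_closed_form(-0.3, 1, 1.0), hinge_conjugate_grid(-0.3, 1, 1.0))  # -0.3 -0.3
# Outside the domain the grid value grows with the radius, reflecting +infinity.
print(hinge_conjugate_closed_form(2.0, 1, 1.0), hinge_conjugate_grid(2.0, 1, 1.0))  # inf 100.0
```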



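For $Y_i = 1$ we have

$$\operatorname{Prox}_{\frac{1}{\mu}g_i^*}\left(\frac{c}{\mu}\right) = \operatorname*{argmin}_{\substack{p_j = 0,\ j \neq i\\ p_i \in [-C, 0]}}\left\{\frac{1}{\mu}p_i + \frac{1}{2}\left(p_i - \frac{c_i}{\mu}\right)^2\right\} = \left(0, \ldots, 0, P_{[-C,0]}\left(\frac{c_i - 1}{\mu}\right), 0, \ldots, 0\right)^T,$$

while for $Y_i = -1$ it holds

$$\operatorname{Prox}_{\frac{1}{\mu}g_i^*}\left(\frac{c}{\mu}\right) = \operatorname*{argmin}_{\substack{p_j = 0,\ j \neq i\\ p_i \in [0, C]}}\left\{-\frac{1}{\mu}p_i + \frac{1}{2}\left(p_i - \frac{c_i}{\mu}\right)^2\right\} = \left(0, \ldots, 0, P_{[0,C]}\left(\frac{c_i + 1}{\mu}\right), 0, \ldots, 0\right)^T.$$

In both cases the objective is a strongly convex quadratic function of $p_i$, so its minimizer over an interval is the projection onto that interval of its unconstrained minimizer. Summarizing, it follows

$$\operatorname{Prox}_{\frac{1}{\mu}g_i^*}\left(\frac{c}{\mu}\right) = \left(0, \ldots, 0, P_{Y_i[-C,0]}\left(\frac{c_i - Y_i}{\mu}\right), 0, \ldots, 0\right)^T,$$

where the nonzero entry is the $i$-th one and $Y_i[-C,0] = \{Y_i t : t \in [-C, 0]\}$.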
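The summarized formula can be sketched in a few lines, computing only the nonzero $i$-th entry. The helper names `project`, `prox_entry` and `prox_entry_brute_force` are hypothetical; the brute-force variant minimizes the one-dimensional objective above over a grid and serves only as a cross-check:

```python
def project(t, lo, hi):
    """Projection of the real number t onto the interval [lo, hi]."""
    return min(max(t, lo), hi)

def prox_entry(c_i, y_i, mu, C):
    """i-th entry of Prox_{(1/mu) g_i^*}(c / mu) = P_{Y_i[-C,0]}((c_i - Y_i) / mu).

    Y_i[-C, 0] equals [-C, 0] for Y_i = +1 and [0, C] for Y_i = -1.
    """
    lo, hi = (-C, 0.0) if y_i == 1 else (0.0, C)
    return project((c_i - y_i) / mu, lo, hi)

def prox_entry_brute_force(c_i, y_i, mu, C, steps=100000):
    """Minimize (1/mu) p*Y_i + (1/2)(p - c_i/mu)^2 over p*Y_i in [-C, 0] on a uniform grid."""
    lo, hi = (-C, 0.0) if y_i == 1 else (0.0, C)
    grid = (lo + (hi - lo) * j / steps for j in range(steps + 1))
    return min(grid, key=lambda p: p * y_i / mu + 0.5 * (p - c_i / mu) ** 2)

print(prox_entry(0.5, 1, 0.5, 1.0))  # -1.0 (unconstrained minimizer sits at the bound)
print(prox_entry(3.0, 1, 1.0, 1.0))  # 0.0 (2.0 is projected onto [-1, 0])
```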



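Thus, for every $c = (c_1, \ldots, c_n)^T \in \mathbb{R}^n$ we have

$$\nabla\left(\sum_{i=1}^n {}^{\mu}g_i \circ K\right)(c) = \sum_{i=1}^n \nabla\left({}^{\mu}g_i \circ K\right)(c) = \sum_{i=1}^n K^T \operatorname{Prox}_{\frac{1}{\mu}g_i^*}\left(\frac{Kc}{\mu}\right) = K\left(P_{Y_1[-C,0]}\left(\frac{(Kc)_1 - Y_1}{\mu}\right), \ldots, P_{Y_n[-C,0]}\left(\frac{(Kc)_n - Y_n}{\mu}\right)\right)^T,$$

where ${}^{\mu}g_i$ denotes the Moreau envelope of $g_i$ of parameter $\mu$ and the last equality uses $K^T = K$.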

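Using the nonexpansiveness of the projection operator, we obtain for every $c, d \in \mathbb{R}^n$

$$\left\|\nabla\left(\sum_{i=1}^n {}^{\mu}g_i \circ K\right)(c) - \nabla\left(\sum_{i=1}^n {}^{\mu}g_i \circ K\right)(d)\right\| \leq \|K\| \left\|\frac{Kc - Kd}{\mu}\right\| \leq \frac{\|K\|^2}{\mu}\|c - d\|.$$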
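The gradient formula and the Lipschitz bound $\|K\|^2/\mu$ can be checked empirically on a small example. In this sketch the matrix $K$ is a hypothetical symmetric positive definite $3 \times 3$ matrix (not a Gram matrix from the data set) whose eigenvalues are $2$ and $2 \pm \sqrt{2}$, so that $\|K\| = 2 + \sqrt{2}$ is known exactly; the helper names are ours:

```python
import math
import random

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def smoothed_gradient(K, Y, c, mu, C):
    """Gradient of sum_i ({}^mu g_i o K) at c for symmetric K:
    K (P_{Y_1[-C,0]}(((Kc)_1 - Y_1)/mu), ..., P_{Y_n[-C,0]}(((Kc)_n - Y_n)/mu))^T."""
    Kc = matvec(K, c)
    p = []
    for kc_i, y_i in zip(Kc, Y):
        lo, hi = (-C, 0.0) if y_i == 1 else (0.0, C)
        p.append(min(max((kc_i - y_i) / mu, lo), hi))
    return matvec(K, p)

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

K = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
Y = [1, -1, 1]
C = 1.0
norm_K = 2.0 + math.sqrt(2.0)  # largest eigenvalue of this tridiagonal K

random.seed(0)
worst_ratio = 0.0
for _ in range(1000):
    c = [random.uniform(-3.0, 3.0) for _ in Y]
    d = [random.uniform(-3.0, 3.0) for _ in Y]
    mu = random.uniform(0.05, 2.0)
    lhs = distance(smoothed_gradient(K, Y, c, mu, C), smoothed_gradient(K, Y, d, mu, C))
    # Ratio of the observed gradient difference to the bound ||K||^2 / mu * ||c - d||.
    worst_ratio = max(worst_ratio, lhs / (norm_K ** 2 / mu * distance(c, d)))
print(worst_ratio <= 1.0 + 1e-12)  # True
```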
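Choosing $\mu_k = \frac{1}{\alpha k}$, for some parameter $\alpha \in \mathbb{R}_{++}$, and taking into account that $L_k = \|K\| + \alpha k \|K\|^2$ for $k \geq 1$, the iterative scheme (A3) with starting point $x_0 \in \mathbb{R}^n$ becomes

Initialization: $t_1 = 1$, $y_1 = x_0 \in \mathbb{R}^n$, $\alpha \in \mathbb{R}_{++}$,

For $k \geq 1$: $\mu_k = \frac{1}{\alpha k}$, $L_k = \|K\| + \alpha k \|K\|^2$,

$$x_k = y_k - \frac{1}{L_k}\left[K y_k + K\left(P_{Y_1[-C,0]}\left(\frac{(K y_k)_1 - Y_1}{\mu_k}\right), \ldots, P_{Y_n[-C,0]}\left(\frac{(K y_k)_n - Y_n}{\mu_k}\right)\right)^T\right],$$

$$t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2},$$

$$y_{k+1} = x_k + \frac{t_k - 1}{t_{k+1}}(x_k - x_{k-1}).$$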
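A minimal pure-Python sketch of this iterative scheme on a hypothetical toy data set (four one-dimensional points, $\sigma = 1$, $C = 1$, $\alpha = 1$). The starting point $x_0 = 0$ and the estimation of $\|K\|$ by power iteration are our own implementation choices, not taken from the text, and the sketch is meant for illustration rather than for the 5268-image experiment:

```python
import math

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def gaussian_gram(points, sigma):
    """Gram matrix K_ij = exp(-||X_i - X_j||^2 / (2 sigma^2))."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2.0 * sigma ** 2))
             for q in points] for p in points]

def spectral_norm_psd(K, iters=200):
    """Estimate ||K|| (largest eigenvalue of a symmetric PSD K) by power iteration."""
    v = [1.0] * len(K)
    for _ in range(iters):
        w = matvec(K, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return sum(a * b for a, b in zip(v, matvec(K, v)))  # Rayleigh quotient

def objective(K, Y, c, C):
    """Objective of (23): (1/2) c^T K c + C * sum_i max{1 - Y_i (Kc)_i, 0}."""
    Kc = matvec(K, c)
    quad = 0.5 * sum(a * b for a, b in zip(c, Kc))
    return quad + C * sum(max(1.0 - yi * ki, 0.0) for yi, ki in zip(Y, Kc))

def variable_smoothing_svm(K, Y, C, alpha, iterations):
    """Run the variable smoothing scheme and return the last iterate x_k."""
    norm_K = spectral_norm_psd(K)
    x_prev = [0.0] * len(Y)   # x_0 = 0 (our choice of starting point)
    y_k = list(x_prev)        # y_1 = x_0
    t = 1.0                   # t_1 = 1
    for k in range(1, iterations + 1):
        mu = 1.0 / (alpha * k)
        L = norm_K + alpha * k * norm_K ** 2
        Ky = matvec(K, y_k)
        # Entries P_{Y_i[-C,0]}(((K y_k)_i - Y_i) / mu_k)
        p = []
        for ky_i, y_i in zip(Ky, Y):
            lo, hi = (-C, 0.0) if y_i == 1 else (0.0, C)
            p.append(min(max((ky_i - y_i) / mu, lo), hi))
        Kp = matvec(K, p)
        x = [yk - (a + b) / L for yk, a, b in zip(y_k, Ky, Kp)]
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y_k = [xi + (t - 1.0) / t_next * (xi - xp) for xi, xp in zip(x, x_prev)]
        x_prev, t = x, t_next
    return x_prev

# Hypothetical toy data: four points on the real line with labels -1, -1, +1, +1.
X = [[-2.0], [-1.0], [1.0], [2.0]]
Y = [-1, -1, 1, 1]
K = gaussian_gram(X, sigma=1.0)
c = variable_smoothing_svm(K, Y, C=1.0, alpha=1.0, iterations=300)
print(objective(K, Y, [0.0] * 4, 1.0))  # 4.0 (value at the starting point)
print(objective(K, Y, c, 1.0))
print([1 if v > 0 else -1 for v in matvec(K, c)])  # predicted labels sign((Kc)_i)
```

Since $\nabla f(y_k) = K y_k$, each iteration costs two matrix-vector products with $K$, which is what makes the scheme attractive for larger Gram matrices.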




Figure 4.4: Example of two fine and two defective devices. 

Turning to the real-world data set, we denote by $\mathcal{D} = \{(X_i, Y_i), i = 1, \ldots, 5268\} \subseteq \mathbb{R}^{10000} \times \{+1, -1\}$ the set of all available data, consisting of 2682 images of class $+1$ and 2586 images of class $-1$. Two examples of each class are shown in Figure 4.4.



For numerical reasons, the images have been normalized (cf. [?]) by dividing each of them by the quantity $\left(\frac{1}{5268}\sum_{i=1}^{5268}\|X_i\|^2\right)^{1/2}$. As regularization parameter we considered $C = 100$ and as kernel parameter $\sigma = 0.5$, which are the optimal values reported in [4] for this data set from a given pool of parameter combinations. We tested different values of $\alpha \in \mathbb{R}_{++}$ and performed for each of those choices a 10-fold cross validation on $\mathcal{D}$. The algorithm was terminated after a fixed number of 10000 iterations; the resulting average classification errors are presented in Table 4.2. For $\alpha$ = 1e-3 we obtained the lowest misclassification rate, namely 0.2278 percent. In other words, out of the 527 images belonging to the test data set, on average 1.2 were not correctly classified.

α      1e-5     1e-4     1e-3     1e-2     1e-1     1        1e+1     1e+2     1e+3
err    0.4176   0.3037   0.2278   0.2468   0.3986   0.5315   0.5125   1.5945   48.9561

Table 4.2: Average classification errors (in percent).
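The preprocessing and the arithmetic behind the reported error rate can be sketched as follows, assuming each image is a flat list of pixel values and reading the normalization constant as the square root of the mean squared Euclidean norm of all images (the function name `normalize_images` and the toy images are hypothetical):

```python
import math

def normalize_images(images):
    """Divide every image by sqrt((1/N) * sum_i ||X_i||^2)."""
    n = len(images)
    scale = math.sqrt(sum(sum(v * v for v in img) for img in images) / n)
    return [[v / scale for v in img] for img in images]

imgs = [[3.0, 4.0], [0.0, 5.0], [6.0, 8.0]]
normalized = normalize_images(imgs)
mean_sq_norm = sum(sum(v * v for v in img) for img in normalized) / len(normalized)
print(round(mean_sq_norm, 12))  # 1.0

# A 0.2278 % error rate on a test fold of about 5268 / 10 ≈ 527 images.
print(round(0.2278 / 100 * 527, 2))  # 1.2
```

After this scaling the images have mean squared norm 1, which keeps the exponent of the Gaussian kernel in a moderate range for $\sigma = 0.5$.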